arXiv:1505.02570v1 [math.ST] 11 May 2015


AN ASYMPTOTIC LINEAR REPRESENTATION FOR THE BRESLOW ESTIMATOR 


Hendrik P. Lopuhaä and Gabriela F. Nane

Department of Applied Mathematics 

Delft University of Technology 

Mekelweg 4, 2628 CD, Delft, The Netherlands 

G.F.Nane@tudelft.nl

Key Words: Cox model; asymptotics; empirical processes. 

Mathematics Subject Classification: 62G20, 62G05, 62N02.

ABSTRACT

We provide an asymptotic linear representation for the Breslow estimator of the baseline
cumulative hazard function in the Cox model. Our representation consists of an average of
independent random variables and a term involving the difference between the maximum
partial likelihood estimator and the underlying regression parameter. The order of the
remainder term is arbitrarily close to $n^{-1}$.

1. INTRODUCTION

The proportional hazards model is one of the most popular approaches to model
right-censored time to event data in the presence of covariates. Cox (1972) introduced this
semi-parametric model and focused on estimating the underlying regression coefficients of the
covariates. His estimator was later shown (Cox, 1975) to be a maximum partial likelihood
estimator and its asymptotic properties were broadly studied (Tsiatis, 1981; Andersen et
al., 1993; Oakes, 1977; Slud, 1982). Different functionals of the lifetime distribution are
commonly investigated and the (cumulative) hazard function is of particular interest. In the
discussion following Cox's (1972) paper, Breslow proposed a nonparametric maximum
likelihood estimator for the baseline cumulative hazard function. Asymptotic properties of
the Breslow estimator, such as consistency and the asymptotic distribution, were derived by
Tsiatis (1981) and Andersen et al. (1993). For an overview of the Breslow estimator, see
Lin (2007).

Estimators in unconditional censorship models such as the Kaplan-Meier and Nelson-Aalen
estimators have received considerable attention, especially in the 1980s. Established
large sample properties include consistency and asymptotic normality (Breslow and Crowley,
1974), rate of strong uniform consistency (Csörgő and Horváth, 1983), strong approximation
or Hungarian embedding (Burke et al., 1981), and linearization results (Lo and Singh, 1985).
Lo and Singh (1985) expressed the difference between the Kaplan-Meier estimator and the
underlying distribution function in terms of a sum of independent identically distributed
random variables, almost surely, with a remainder term of the order $n^{-3/4}(\log n)^{3/4}$,
with $n$ denoting the sample size; this rate was later improved to $n^{-1}\log n$ by Lo et al.
(1989). To our knowledge, a strong approximation result for the Breslow estimator is
unavailable in the literature. Kosorok (2008) establishes a representation of the Breslow
estimator in terms of counting processes. Although this can be turned into an asymptotic
linear representation similar to the one in Lo and Singh (1985), the covariates are assumed
to be in a bounded set and the remainder term is only shown to be of the order $o_p(n^{-1/2})$.

In this paper, we derive a similar linearization result for the Breslow estimator, i.e., we
prove that the difference between the estimator $\Lambda_n$ and the cumulative baseline hazard
function $\Lambda_0$ can be represented as a sum of independent random variables and a term
involving the difference between the regression parameter and its maximum partial likelihood
estimator. However, we allow unbounded covariates and we show that the remainder term is
of the order $n^{-1}a_n^{-1}$, where $a_n$ may be any sequence tending to zero. As $a_n$ can be
chosen to converge to zero arbitrarily slowly, this means that the order of the remainder term
is arbitrarily close to $n^{-1}$. The proof is based on empirical process theory, which allows
the extension of our result to related semi-parametric models, such as marginal regression
models. Our main motivation is isotonic estimation of the baseline distribution in the Cox
model. An example is the Grenander type estimator $\tilde\lambda_n$ for an increasing baseline
hazard $\lambda_0$, considered in Lopuhaä and Nane (2013), which is defined as the left-hand
slope of the greatest convex minorant of the Breslow estimator. The limit behavior of
$\tilde\lambda_n$ at a fixed point $t_0$ essentially follows from the limit behavior of the process

\[
t \mapsto n^{2/3}\left\{ (\Lambda_n - \Lambda_0)(t_0 + n^{-1/3}t) - (\Lambda_n - \Lambda_0)(t_0) \right\}.
\]

In the absence of a strong approximation result for the process $\Lambda_n - \Lambda_0$, an
alternative to obtain the limit process is to apply the results in Kim and Pollard (1990) to
the linear representation for $\Lambda_n - \Lambda_0$, provided that the remaining terms in the
representation are of order smaller than $n^{-2/3}$. This cannot be ensured by the
representation in Kosorok (2008), whereas the order $n^{-1}a_n^{-1}$ can be chosen sufficiently
small, for suitable choices of $a_n$. Another application of our linear representation is that,
together with a linear representation for the maximum partial likelihood estimator, a central
limit theorem can be established for $\Lambda_n - \Lambda_0$. Moreover, such a representation may
also provide a means to estimate the variance of the Breslow estimator, by using plug-in
estimators. A linear representation for the partial maximum likelihood estimator can be
deduced from results in Tsiatis (1981) or Kosorok (2008).

The paper is organized as follows. The Cox model and the Breslow estimator are
introduced in Section 2. Section 3 is devoted to the main result of the paper and its proof,
as well as to preparatory lemmas.

2. BACKGROUND, NOTATION, AND ASSUMPTIONS 

Let $X$ denote a positive random variable representing the survival time of a population
of interest. The random variable $C$ denotes the censoring time. Now, define $T = \min(X, C)$
as the generic follow-up time and $\Delta = \{X \le C\}$ as its corresponding indicator, where
$\{\cdot\}$ denotes the indicator function. Suppose that at the beginning of the study, extra
information such as sex, age, status of a disease, etc. is recorded for each subject as
covariates. Let $Z$ denote a $p$-dimensional covariate vector. Therefore, suppose we observe
the following independent, identically distributed triplets $(T_i, \Delta_i, Z_i)$, with
$i = 1, \dots, n$. The censoring mechanism is assumed to be non-informative. Moreover, given
the covariate $Z$, the survival time $X$ is assumed to be independent of the censoring time $C$.
The $p$-dimensional covariate vector $Z$ is assumed to be time invariant and non-degenerate.

In the Cox model, the distribution of the survival time is related to the corresponding
covariate by

\[
\lambda(x \mid z) = \lambda_0(x)\, e^{\beta_0' z}, \qquad x \in \mathbb{R}^+,
\]

where $\lambda(x \mid z)$ is the hazard function for a subject with covariate vector
$z \in \mathbb{R}^p$, $\lambda_0$ represents the underlying baseline hazard function, and
$\beta_0 \in \mathbb{R}^p$ is the vector of the underlying regression coefficients.
Conditionally on $Z = z$, the survival time $X$ is assumed to be a nonnegative random
variable, with an absolutely continuous distribution function $F(x \mid z)$ with density
$f(x \mid z)$. The same assumptions hold for the censoring variable $C$ and its
distribution function $G$. Let $H$ be the distribution function of the follow-up time $T$ and
let $\tau_H = \inf\{t : H(t) = 1\}$ be the end point of the support of $H$. Moreover, let
$\tau_F$ and $\tau_G$ be the end points of the support of $F$ and $G$, respectively. We employ
the usual assumptions for deriving large sample properties of Cox proportional hazards
estimators (Tsiatis, 1981):


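(A1) $\tau_H = \tau_G < \tau_F$.

(A2) There exists $\varepsilon > 0$ such that

\[
\sup_{|\beta - \beta_0| \le \varepsilon} E\left[ |Z|^2 e^{2\beta' Z} \right] < \infty,
\]

where $|\cdot|$ denotes the Euclidean norm.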


Let $X_{(1)} < \cdots < X_{(m)}$ denote the ordered, observed survival times. Cox (1972, 1975)
introduced the proportional hazards model and proposed the partial likelihood estimator
$\hat\beta_n$ as an estimator for the underlying regression coefficients $\beta_0$. Breslow
(Cox, 1972) focused on estimating the baseline cumulative hazard function,
$\Lambda_0(x) = \int_0^x \lambda_0(u)\,\mathrm{d}u$, and proposed

\[
\Lambda_n(x) = \sum_{i \,:\, X_{(i)} \le x} \frac{d_i}{\sum_{j=1}^{n} \{T_j \ge X_{(i)}\}\, e^{\hat\beta_n' Z_j}}
\tag{1}
\]

as an estimator for $\Lambda_0$, where $d_i$ is the number of events at $X_{(i)}$ and
$\hat\beta_n$ is the partial maximum likelihood estimator of the regression coefficients. The
estimator $\Lambda_n$ is most commonly referred to as the Breslow estimator. Under the
assumption of a piecewise constant baseline hazard function and assuming that all the
censoring times are shifted to the preceding observed survival time, Breslow showed that the
partial maximum likelihood estimator $\hat\beta_n$ along with the baseline cumulative hazard
estimator $\Lambda_n$ can be obtained by jointly maximizing the full loglikelihood function.
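
As an illustration of the sum in (1), the following is a minimal Python sketch, not the authors' implementation. The function name `breslow`, the list-based data layout and the toy data are assumptions made here, and the regression vector is supplied by the caller rather than obtained by maximizing the partial likelihood.

```python
import math


def breslow(times, events, covariates, beta, x):
    """Breslow estimate of the baseline cumulative hazard at x, following (1).

    times      : follow-up times T_i
    events     : indicators Delta_i (1 = event observed, 0 = censored)
    covariates : covariate vectors Z_i, each a list of floats
    beta       : regression vector (supplied by the caller, not fitted here)
    """
    # Risk weights exp(beta' Z_j), one per subject.
    weights = [math.exp(sum(b * z for b, z in zip(beta, zj))) for zj in covariates]
    # Distinct observed event times X_(1) < ... < X_(m) that do not exceed x.
    event_times = sorted({t for t, d in zip(times, events) if d == 1 and t <= x})
    total = 0.0
    for s in event_times:
        d_s = sum(1 for t, d in zip(times, events) if d == 1 and t == s)
        at_risk = sum(w for t, w in zip(times, weights) if t >= s)
        total += d_s / at_risk
    return total


# Hypothetical toy data: three subjects, one binary covariate.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[0.0], [1.0], [0.0]]
print(breslow(times, events, covariates, [math.log(2.0)], 3.0))
```

With a zero regression vector the weights are all one and the function reduces to the Nelson-Aalen estimator.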

Let

\[
\Phi(\beta, x) = \int \{u \ge x\}\, e^{\beta' z}\,\mathrm{d}P(u, \delta, z), \qquad
\Phi_n(\beta, x) = \int \{u \ge x\}\, e^{\beta' z}\,\mathrm{d}P_n(u, \delta, z),
\tag{2}
\]

where $P$ is the underlying probability measure corresponding to the distribution of
$(T, \Delta, Z)$ and $P_n$ is the empirical measure of the triplets $(T_i, \Delta_i, Z_i)$, for
$i = 1, 2, \dots, n$. Furthermore, let $H^{uc}(x) = P(T \le x, \Delta = 1)$ be the
sub-distribution function of the uncensored observations. Then, using the derivations in
Tsiatis (1981), it can be deduced that

\[
\lambda_0(u) = \frac{\mathrm{d}H^{uc}(u)/\mathrm{d}u}{\Phi(\beta_0, u)}.
\tag{3}
\]


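Consequently, it can be derived that

\[
\Lambda_0(x) = \int \frac{\delta\{u \le x\}}{\Phi(\beta_0, u)}\,\mathrm{d}P(u, \delta, z).
\tag{4}
\]

From (A1) it follows that $\Lambda_0(\tau_H) < \infty$. An intuitive baseline cumulative hazard
function estimator is obtained by replacing $\Phi$ in (4) by $\Phi_n$ and by plugging in
$\hat\beta_n$, which yields exactly the Breslow estimator in (1):

\[
\Lambda_n(x) = \int \frac{\delta\{u \le x\}}{\Phi_n(\hat\beta_n, u)}\,\mathrm{d}P_n(u, \delta, z).
\tag{5}
\]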
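
The integral form (5) can be evaluated directly, because an integral with respect to the empirical measure is an average over the observed triplets. The Python sketch below is an illustration under assumptions: the names `phi_n` and `breslow_integral_form` and the toy data are hypothetical, and the regression vector is passed in rather than estimated.

```python
import math


def phi_n(times, covariates, beta, u):
    """Empirical Phi_n(beta, u) = (1/n) * sum_j 1{T_j >= u} exp(beta' Z_j), cf. (2)."""
    n = len(times)
    return sum(
        math.exp(sum(b * z for b, z in zip(beta, zj)))
        for t, zj in zip(times, covariates)
        if t >= u
    ) / n


def breslow_integral_form(times, events, covariates, beta, x):
    """Average over observations of delta * 1{u <= x} / Phi_n(beta, u), cf. (5)."""
    n = len(times)
    return sum(
        1.0 / phi_n(times, covariates, beta, t)
        for t, d in zip(times, events)
        if d == 1 and t <= x
    ) / n


# Hypothetical toy data: three subjects, one binary covariate.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[0.0], [1.0], [0.0]]
print(breslow_integral_form(times, events, covariates, [math.log(2.0)], 3.0))
```

Each uncensored observation contributes $1/(n\,\Phi_n(\beta, T_i))$, which is one over the weighted size of the risk set at $T_i$, so this average agrees with the sum over event times.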

Kosorok (2008) established strong uniform consistency for the Breslow estimator and the
process convergence of $\sqrt{n}(\Lambda_n - \Lambda_0)$, yet under the strong assumption of
bounded covariates. Using standard empirical processes methods, Lopuhaä and Nane (2013)
established strong uniform consistency at rate $n^{-1/2}$ for the Breslow estimator under the
relatively mild conditions (A1) and (A2).

3. ASYMPTOTIC REPRESENTATION


The following two lemmas will be used in proving the main result of the paper. 


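LEMMA 1. Suppose that condition (A2) holds and let $\Phi_n$ and $\Phi$ be defined in (2). With
$\varepsilon > 0$ taken from (A2), for $|\beta - \beta_0| < \varepsilon$, let

\[
\begin{aligned}
D^{(1)}(\beta, x) &= \frac{\partial \Phi(\beta, x)}{\partial \beta}
  = \int \{u \ge x\}\, z\, e^{\beta' z}\,\mathrm{d}P(u, \delta, z) \in \mathbb{R}^p, \\
D_n^{(1)}(\beta, x) &= \frac{\partial \Phi_n(\beta, x)}{\partial \beta}
  = \int \{u \ge x\}\, z\, e^{\beta' z}\,\mathrm{d}P_n(u, \delta, z) \in \mathbb{R}^p.
\end{aligned}
\tag{6}
\]

Then,

\[
\begin{aligned}
\sqrt{n}\, \sup_{x \in \mathbb{R}} \left| \Phi_n(\beta_0, x) - \Phi(\beta_0, x) \right| &= O_p(1), \\
\sqrt{n}\, \sup_{x \in \mathbb{R}} \left| D_n^{(1)}(\beta_0, x) - D^{(1)}(\beta_0, x) \right| &= O_p(1).
\end{aligned}
\tag{7}
\]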

Proof. Consider the class of functions $\mathcal{G} = \{g(u, z; x) : x \in \mathbb{R}\}$, where,
for each $x \in \mathbb{R}$ and $\beta_0 \in \mathbb{R}^p$ fixed,

\[
g(u, z; x) = \{u \ge x\} \exp(\beta_0' z)
\]

is a product of an indicator and a fixed function. It follows that $\mathcal{G}$ is a
Vapnik-Červonenkis (VC)-subgraph class (Lemma 2.6.18 in van der Vaart and Wellner, 1996) and
its envelope $G = \exp(\beta_0' z)$ is square integrable under condition (A2). Standard results
from empirical process theory (van der Vaart and Wellner, 1996) yield that the class of
functions $\mathcal{G}$ is a Donsker class, i.e.,

\[
\sqrt{n} \int g(u, z; x)\,\mathrm{d}(P_n - P)(u, \delta, z) = O_p(1),
\]

so that the first statement in (7) follows by the continuous mapping theorem. To prove the
second statement, it suffices to consider each $j$th coordinate, for $j = 1, \dots, p$, fixed.
In this case, we deal with the class $\mathcal{G}_j = \{g_j(u, z; x) : x \in \mathbb{R}\}$, where

\[
g_j(u, z; x) = \{u \ge x\}\, z_j \exp(\beta_0' z).
\]

From here the argument is exactly the same, which proves the lemma.


□

LEMMA 2. Assume (A1) and (A2). Then, for all $M \in (0, \tau_H)$,

\[
a_n\, n \sup_{x \in [0, M]} \left| \int \delta\{u \le x\}
\left( \frac{1}{\Phi_n(\beta_0, u)} - \frac{1}{\Phi(\beta_0, u)} \right)
\mathrm{d}(P_n - P)(u, \delta, z) \right| = O_p(1),
\]

for any sequence $a_n = o(1)$.


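Proof. Consider the class of functions $\mathcal{F}_n = \{f_n(u, \delta, z; x) : 0 \le x \le M\}$,
where

\[
f_n(u, \delta, z; x) = \delta\{u \le x\}
\left( \frac{1}{\Phi_n(\beta_0, u)} - \frac{1}{\Phi(\beta_0, u)} \right).
\]

Correspondingly, consider the class $\mathcal{G}_{n,M,\alpha}$ consisting of functions

\[
g(u, \delta, z; y, \Psi) = \delta\{u \le y\}
\left( \frac{1}{\Psi(u)} - \frac{1}{\Phi(\beta_0, u)} \right),
\]

where $0 \le y \le M$ and $\Psi$ is nonincreasing left continuous, such that

\[
\Psi(M) \ge K, \qquad \sup_{u \in [0, M]} \left| \Psi(u) - \Phi(\beta_0, u) \right| \le \alpha,
\]

where $K = \Phi(\beta_0, M)/2$. Then, for any $\alpha > 0$, we have
$P(\mathcal{F}_n \subset \mathcal{G}_{n,M,\alpha}) \to 1$, by Lemma 1. Furthermore, the class
$\mathcal{G}_{n,M,\alpha}$ has envelope $G(u, \delta, z) = \alpha/K^2$. Since the functions in
$\mathcal{G}_{n,M,\alpha}$ are products of indicators and a difference of bounded monotone
functions, its entropy with bracketing satisfies

\[
\log N_{[\,]}\big(\varepsilon, \mathcal{G}_{n,M,\alpha}, L_2(P)\big) \lesssim \frac{1}{\varepsilon},
\]

see e.g., Theorem 2.7.5 in van der Vaart and Wellner (1996) and Lemma 9.25 in Kosorok
(2008). Hence, for any $\delta > 0$, the bracketing integral

\[
J_{[\,]}\big(\delta, \mathcal{G}_{n,M,\alpha}, L_2(P)\big)
= \int_0^{\delta} \sqrt{1 + \log N_{[\,]}\big(\varepsilon \|G\|_2, \mathcal{G}_{n,M,\alpha}, L_2(P)\big)}\,\mathrm{d}\varepsilon < \infty.
\]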

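By Theorem 2.14.2 in van der Vaart and Wellner (1996), we have

\[
E \left\| \sqrt{n} \int g(u, \delta, z; y, \Psi)\,\mathrm{d}(P_n - P)(u, \delta, z) \right\|_{\mathcal{G}_{n,M,\alpha}}
\lesssim J_{[\,]}\big(1, \mathcal{G}_{n,M,\alpha}, L_2(P)\big)\, \|G\|_{P,2} = O(\alpha),
\]

where $\|\cdot\|_{\mathcal{F}}$ denotes the supremum over the class of functions $\mathcal{F}$.
Now, let $a_n = o(1)$. Then, according to (7),

\[
a_n \sqrt{n}\, \sup_{x \in \mathbb{R}} \left| \Phi_n(\beta_0, x) - \Phi(\beta_0, x) \right| = o_p(1).
\]

Therefore, if we choose $\alpha = n^{-1/2} a_n^{-1}$, this gives

\[
E \left\| \int g(u, \delta, z; y, \Psi)\,\mathrm{d}(P_n - P)(u, \delta, z) \right\|_{\mathcal{G}_{n,M,\alpha}}
= O\big((n a_n)^{-1}\big),
\]

and hence, by the Markov inequality, this proves the lemma. □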

The asymptotic linear representation of the Breslow estimator is provided by the next 
theorem. 


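THEOREM 1. Assume (A1) and (A2). Let $\Phi$ and $D^{(1)}$ be defined in (2) and (6). Then,
for all $M \in (0, \tau_H)$ and $x \in [0, M]$,

\[
\Lambda_n(x) - \Lambda_0(x) = \frac{1}{n} \sum_{i=1}^{n} \xi(T_i, \Delta_i, Z_i; x)
- (\hat\beta_n - \beta_0)' A_0(x) + R_n(x),
\]

where $\hat\beta_n$ is the maximum partial likelihood estimator,

\[
A_0(x) = \int_0^x \frac{D^{(1)}(\beta_0, u)}{\Phi(\beta_0, u)}\, \lambda_0(u)\,\mathrm{d}u,
\tag{8}
\]

and

\[
\xi(t, \delta, z; x) = -e^{\beta_0' z} \int_0^{x \wedge t} \frac{\lambda_0(u)}{\Phi(\beta_0, u)}\,\mathrm{d}u
+ \frac{\delta\{t \le x\}}{\Phi(\beta_0, t)},
\]

and $R_n$ is such that

\[
\sup_{x \in [0, M]} |R_n(x)| = O_p\big(n^{-1} a_n^{-1}\big),
\]

for any sequence $a_n = o(1)$.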

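Proof. For $\beta \in \mathbb{R}^p$, define

\[
\Lambda_n(\beta, x) = \int \frac{\delta\{u \le x\}}{\Phi_n(\beta, u)}\,\mathrm{d}P_n(u, \delta, z).
\]

Hence, the Breslow estimator in (5) can also be written as $\Lambda_n(\hat\beta_n, x)$. For
$x \in [0, M]$, consider the following decomposition

\[
\Lambda_n(x) - \Lambda_0(x) = T_{n1}(x) + T_{n2}(x),
\]

where $T_{n1}(x) = \Lambda_n(\hat\beta_n, x) - \Lambda_n(\beta_0, x)$ and
$T_{n2}(x) = \Lambda_n(\beta_0, x) - \Lambda_0(x)$.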

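For the term $T_{n1}$, first notice that a Taylor expansion of $\Lambda_n(\cdot, x)$ around
$\beta_0$ yields that

\[
\Lambda_n(\hat\beta_n, x) - \Lambda_n(\beta_0, x) = -(\hat\beta_n - \beta_0)' A_n(x)
+ \frac{1}{2} (\hat\beta_n - \beta_0)' R_{n1}(x) (\hat\beta_n - \beta_0),
\tag{9}
\]

where the vector $A_n$ and matrix $R_{n1}$ are given by

\[
A_n(x) = \int \delta\{u \le x\}\, \frac{D_n^{(1)}(\beta_0, u)}{\Phi_n^2(\beta_0, u)}\,\mathrm{d}P_n(u, \delta, z),
\tag{10}
\]

\[
R_{n1}(x) = \int \delta\{u \le x\}\,
\frac{2 D_n^{(1)}(\beta^*, u) D_n^{(1)}(\beta^*, u)' - D_n^{(2)}(\beta^*, u)\, \Phi_n(\beta^*, u)}{\Phi_n^3(\beta^*, u)}
\,\mathrm{d}P_n(u, \delta, z),
\]

for some $|\beta^* - \beta_0| \le |\hat\beta_n - \beta_0|$, with $D_n^{(1)}$ as defined in (6) and

\[
D_n^{(2)}(\beta, x) = \frac{\partial^2 \Phi_n(\beta, x)}{\partial \beta^2}
= \int \{u \ge x\}\, z z'\, e^{\beta' z}\,\mathrm{d}P_n(u, \delta, z) \in \mathbb{R}^p \times \mathbb{R}^p.
\]

We define $D^{(2)}(\beta, x)$ similarly, with $P_n$ replaced by $P$.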
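
The first-order term of the expansion can be checked numerically: with $A_n(x)$ taken as the empirical average of $\delta\{u \le x\}\, D_n^{(1)}(\beta_0, u)/\Phi_n^2(\beta_0, u)$, the gradient of $\beta \mapsto \Lambda_n(\beta, x)$ at $\beta_0$ equals $-A_n(x)$. The Python sketch below compares that gradient with a central finite difference. The helper names, the toy data, the value of `beta0` and the step size `h` are assumptions chosen for illustration.

```python
import math


def _weight(beta, z):
    """Risk weight exp(beta' z)."""
    return math.exp(sum(b * zk for b, zk in zip(beta, z)))


def breslow_at(times, events, covariates, beta, x):
    """Lambda_n(beta, x): average of delta * 1{u <= x} / Phi_n(beta, u)."""
    n = len(times)
    total = 0.0
    for t, d in zip(times, events):
        if d == 1 and t <= x:
            phi = sum(_weight(beta, zj) for s, zj in zip(times, covariates) if s >= t) / n
            total += 1.0 / phi
    return total / n


def a_n(times, events, covariates, beta, x):
    """A_n(x): average of delta * 1{u <= x} * D_n^(1)(beta, u) / Phi_n(beta, u)^2."""
    n, p = len(times), len(beta)
    out = [0.0] * p
    for t, d in zip(times, events):
        if d == 1 and t <= x:
            risk = [(zj, _weight(beta, zj)) for s, zj in zip(times, covariates) if s >= t]
            phi = sum(w for _, w in risk) / n
            for k in range(p):
                d1_k = sum(zj[k] * w for zj, w in risk) / n
                out[k] += d1_k / phi ** 2
    return [v / n for v in out]


# Hypothetical toy data: three subjects, one binary covariate.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[0.0], [1.0], [0.0]]
beta0 = [math.log(2.0)]
h = 1e-6  # finite-difference step, an arbitrary choice
fd = (
    breslow_at(times, events, covariates, [beta0[0] + h], 3.0)
    - breslow_at(times, events, covariates, [beta0[0] - h], 3.0)
) / (2.0 * h)
print(a_n(times, events, covariates, beta0, 3.0), fd)
```

The two printed numbers should agree up to sign and finite-difference error.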

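According to (A2), we have $|D^{(1)}(\beta_0, x)| \le E\left[ |Z| \exp(\beta_0' Z) \right] < \infty$,
for all $x \in \mathbb{R}$, and similarly

\[
\left| D_n^{(1)}(\beta_0, x) \right| \le \frac{1}{n} \sum_{i=1}^{n} |Z_i|\, e^{\beta_0' Z_i}
\to E\left[ |Z|\, e^{\beta_0' Z} \right] < \infty,
\]

with probability one. Likewise, $|D^{(2)}(\beta_0, x)| < \infty$ and

\[
\left| D_n^{(2)}(\beta_0, x) \right| \le \frac{1}{n} \sum_{i=1}^{n} |Z_i|^2 e^{\beta_0' Z_i}
\to E\left[ |Z|^2 e^{\beta_0' Z} \right] < \infty,
\]

with probability one. Furthermore, for all $x \in [0, M]$,

\[
0 < \Phi(\beta_0, M) \le \Phi(\beta_0, x) \le \Phi(\beta_0, 0) = E\left[ e^{\beta_0' Z} \right] < \infty
\]

and $\Phi_n(\beta_0, M) \le \Phi_n(\beta_0, x) \le \Phi_n(\beta_0, 0)$, where
$\Phi_n(\beta_0, M) \to \Phi(\beta_0, M)$ and $\Phi_n(\beta_0, 0) \to \Phi(\beta_0, 0)$, with
probability one. It follows that there exist constants $K_1, K_2 > 0$, such that for all
$x \in [0, M]$,

\[
\left| D^{(1)}(\beta_0, x) \right| \le K_2, \qquad \left| D^{(2)}(\beta_0, x) \right| \le K_2, \qquad
K_1 \le \Phi(\beta_0, x) \le K_2,
\tag{11}
\]

and for $n$ sufficiently large,

\[
\left| D_n^{(1)}(\beta_0, x) \right| \le K_2, \qquad \left| D_n^{(2)}(\beta_0, x) \right| \le K_2, \qquad
K_1 \le \Phi_n(\beta_0, x) \le K_2,
\tag{12}
\]

with probability one. According to (3),

\[
\int_{\delta \in \{0,1\}} \int_{y \in \mathbb{R}^p} \frac{\delta}{\Phi(\beta_0, u)}\,\mathrm{d}P(u, \delta, y)
= \frac{\mathrm{d}H^{uc}(u)}{\Phi(\beta_0, u)} = \lambda_0(u)\,\mathrm{d}u,
\tag{13}
\]

so that $A_0$, as defined in (8), is equal to

\[
A_0(x) = \int \delta\{u \le x\}\, \frac{D^{(1)}(\beta_0, u)}{\Phi^2(\beta_0, u)}\,\mathrm{d}P(u, \delta, z) \in \mathbb{R}^p.
\]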

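Then, for the $A_n$ term in (9), it can be deduced that

\[
\sup_{0 \le x \le M} |A_n(x) - A_0(x)| \le
\sup_{0 \le u \le M} \left| \frac{D_n^{(1)}(\beta_0, u)}{\Phi_n^2(\beta_0, u)} - \frac{D^{(1)}(\beta_0, u)}{\Phi^2(\beta_0, u)} \right|
+ \sup_{0 \le x \le M} \left| \int \delta\{u \le x\}\, \frac{D^{(1)}(\beta_0, u)}{\Phi^2(\beta_0, u)}\,\mathrm{d}(P_n - P)(u, \delta, z) \right|.
\]

By (11) and (12), the first term on the right hand side is bounded by

\[
\frac{1}{K_1^2} \sup_{0 \le x \le M} \left| D_n^{(1)}(\beta_0, x) - D^{(1)}(\beta_0, x) \right|
+ \frac{2 K_2^2}{K_1^4} \sup_{0 \le x \le M} \left| \Phi_n(\beta_0, x) - \Phi(\beta_0, x) \right|,
\]

which is of the order $O_p(n^{-1/2})$ by Lemma 1. For the second term on the right hand side,
for each $j = 1, \dots, p$, fixed, consider the class
$\mathcal{G}_j = \{g_j(u, \delta; x) : x \in [0, M]\}$, consisting of functions

\[
g_j(u, \delta; x) = \delta\{u \le x\}\, \frac{D_j^{(1)}(\beta_0, u)}{\Phi^2(\beta_0, u)},
\]

where $D_j^{(1)}$ denotes the $j$th coordinate of $D^{(1)}$. Now, each $g_j(u, \delta; x)$ is the
product of indicators and a fixed uniformly bounded function. Standard results from empirical
process theory (van der Vaart and Wellner, 1996) give that the class $\mathcal{G}_j$ is Donsker.
As in the proof of Lemma 1, we find that for every $j = 1, \dots, p$,

\[
\sqrt{n} \sup_{0 \le x \le M} \left| \int g_j(u, \delta; x)\,\mathrm{d}(P_n - P)(u, \delta, z) \right| = O_p(1).
\]

It follows that

\[
\sup_{0 \le x \le M} |A_n(x) - A_0(x)| = O_p(n^{-1/2}),
\]

and we can conclude that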


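\[
(\hat\beta_n - \beta_0)' A_n(x) = (\hat\beta_n - \beta_0)' A_0(x) + R_{n2}(x),
\]

where $R_{n2}(x) = O_p(n^{-1})$, uniformly for $x \in [0, M]$, since
$\hat\beta_n - \beta_0 = O_p(n^{-1/2})$ (Tsiatis, 1981). For the term containing $R_{n1}$, first
observe that, according to (12), for $n$ sufficiently large,

\[
\sup_{u \in [0, M]} \left|
\frac{2 D_n^{(1)}(\beta^*, u) D_n^{(1)}(\beta^*, u)' - D_n^{(2)}(\beta^*, u)\, \Phi_n(\beta^*, u)}{\Phi_n^3(\beta^*, u)}
\right| = O(1),
\]

almost surely, so that

\[
\sup_{0 \le x \le M} \left| \frac{1}{2} (\hat\beta_n - \beta_0)' R_{n1}(x) (\hat\beta_n - \beta_0) \right| = O_p(n^{-1}).
\]

Concluding, by (9),

\[
T_{n1}(x) = -(\hat\beta_n - \beta_0)' A_0(x) + O_p(n^{-1}),
\tag{14}
\]

uniformly in $x \in [0, M]$. Proceeding with $T_{n2}$, write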


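\[
T_{n2}(x) = \Lambda_n(\beta_0, x) - \Lambda_0(x) = B_n(x) + C_n(x) + R_{n3}(x) + R_{n4}(x),
\]

where

\[
\begin{aligned}
B_n(x) &= \int \delta\{u \le x\}\, \frac{\Phi(\beta_0, u) - \Phi_n(\beta_0, u)}{\Phi^2(\beta_0, u)}\,\mathrm{d}P(u, \delta, z), \\
C_n(x) &= \int \frac{\delta\{u \le x\}}{\Phi(\beta_0, u)}\,\mathrm{d}(P_n - P)(u, \delta, z), \\
R_{n3}(x) &= \int \delta\{u \le x\} \left( \frac{1}{\Phi_n(\beta_0, u)} - \frac{1}{\Phi(\beta_0, u)} \right) \mathrm{d}(P_n - P)(u, \delta, z), \\
R_{n4}(x) &= \int \delta\{u \le x\}\, \frac{\left[ \Phi_n(\beta_0, u) - \Phi(\beta_0, u) \right]^2}{\Phi^2(\beta_0, u)\, \Phi_n(\beta_0, u)}\,\mathrm{d}P(u, \delta, z).
\end{aligned}
\]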

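For the dominating term in $T_{n2}$, we can write

\[
B_n(x) + C_n(x) = -\int \delta\{u \le x\}\, \frac{\Phi_n(\beta_0, u)}{\Phi^2(\beta_0, u)}\,\mathrm{d}P(u, \delta, z)
+ \int \frac{\delta\{u \le x\}}{\Phi(\beta_0, u)}\,\mathrm{d}P_n(u, \delta, z)
= \frac{1}{n} \sum_{i=1}^{n} \xi(T_i, \Delta_i, Z_i; x),
\]

where

\[
\xi(t, \delta, z; x) = -\int \gamma\{u \le x\}\, \frac{\{t \ge u\}\, e^{\beta_0' z}}{\Phi^2(\beta_0, u)}\,\mathrm{d}P(u, \gamma, y)
+ \frac{\delta\{t \le x\}}{\Phi(\beta_0, t)}.
\]

Using (13), we conclude that

\[
\xi(t, \delta, z; x) = -e^{\beta_0' z} \int_0^{x \wedge t} \frac{\lambda_0(u)}{\Phi(\beta_0, u)}\,\mathrm{d}u
+ \frac{\delta\{t \le x\}}{\Phi(\beta_0, t)}.
\]

For the remainder terms, it follows by Lemma 2 that for any sequence $a_n = o(1)$,

\[
\sup_{0 \le x \le M} |R_{n3}(x)| = O_p\big(n^{-1} a_n^{-1}\big).
\tag{15}
\]

To treat $R_{n4}$, note that

\[
|R_{n4}(x)| \le \frac{1}{\Phi^2(\beta_0, M)}\, \frac{1}{\Phi_n(\beta_0, M)}\,
\sup_{x \in \mathbb{R}} \left| \Phi_n(\beta_0, x) - \Phi(\beta_0, x) \right|^2,
\]

so that by (7) and (12),

\[
\sup_{0 \le x \le M} |R_{n4}(x)| = O_p(n^{-1}).
\]

Together with (14) and (15), this proves the theorem. □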


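In the special case of no covariates, i.e., $\beta_0 = \hat\beta_n = 0$, it follows that

\[
\Phi(\beta_0, x) = 1 - H(x)
\]

and

\[
\xi(t, \delta, z; x) = -\int_0^{x \wedge t} \frac{\lambda_0(u)}{\Phi(\beta_0, u)}\,\mathrm{d}u + \frac{\delta\{t \le x\}}{\Phi(\beta_0, t)}
= -\int_0^{x \wedge t} \frac{\mathrm{d}H^{uc}(u)}{[1 - H(u)]^2} + \frac{\delta\{t \le x\}}{1 - H(t)}.
\]

This means that Theorem 1 retrieves a result similar to Lemma 2.1 in Lo et al. (1989).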
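
A small simulation can make the no-covariate case concrete. The Python sketch below is an illustration under assumptions that are not part of the paper: survival and censoring times are both taken to be standard exponential, so that $\lambda_0 = 1$, $\Lambda_0(x) = x$ and $1 - H(u) = e^{-2u}$, and the sample size, evaluation point and seed are arbitrary. It compares the Nelson-Aalen estimator with the average of the $\xi(T_i, \Delta_i; x)$ and prints what is left over.

```python
import math
import random


def xi(t, delta, x):
    """xi(t, delta; x) for lambda_0 = 1 and 1 - H(u) = exp(-2u) (assumed model)."""
    m = min(x, t)
    drift = (math.exp(2.0 * m) - 1.0) / 2.0  # integral of e^{2u} over [0, min(x, t)]
    jump = math.exp(2.0 * t) if (delta == 1 and t <= x) else 0.0
    return -drift + jump


def nelson_aalen(times, events, x):
    """Breslow estimator without covariates; assumes distinct follow-up times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    total = 0.0
    for rank, i in enumerate(order):
        if times[i] > x:
            break
        if events[i] == 1:
            total += 1.0 / (n - rank)  # n - rank subjects are still at risk
    return total


rng = random.Random(0)  # arbitrary seed
n, x = 2000, 0.5
times, events = [], []
for _ in range(n):
    surv, cens = rng.expovariate(1.0), rng.expovariate(1.0)
    times.append(min(surv, cens))
    events.append(1 if surv <= cens else 0)

linear_term = sum(xi(t, d, x) for t, d in zip(times, events)) / n
remainder = nelson_aalen(times, events, x) - x - linear_term
print(linear_term, remainder)
```

In this setting the linear term fluctuates at the scale $n^{-1/2}$, whereas the remainder is expected to be markedly smaller.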

The rate at which the error term tends to zero becomes faster as $a_n$ tends to zero more
slowly. If $a_n = 1/\log n$, we obtain the same rate as the error term in Lemma 2.1 in Lo et
al. (1989). However, they obtain the order $O(n^{-1}\log n)$ almost surely, whereas Theorem 1
with the choice $a_n = 1/\log n$ only provides this order in probability. Also, the sequence
$a_n$ may be chosen to converge to zero arbitrarily slowly. This means that the order
$O_p(n^{-1}a_n^{-1})$ of $R_n$ is arbitrarily close to $O_p(n^{-1})$.
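
To get a feel for these orders, the short Python sketch below evaluates $n^{-1}a_n^{-1}$ for a few sequences $a_n$ at one sample size and prints the benchmarks $n^{-1}$, $n^{-2/3}$ and $n^{-1/2}$ next to them. The sample size and the particular sequences are illustrative assumptions; constants are ignored, so only the relative magnitudes are meaningful.

```python
import math


def remainder_order(n, a_n):
    """Order n^{-1} a_n^{-1} of the remainder term for a given value of a_n."""
    return 1.0 / (n * a_n)


n = 10 ** 6  # illustrative sample size
choices = {
    "a_n = 1/log n": 1.0 / math.log(n),
    "a_n = 1/log log n": 1.0 / math.log(math.log(n)),
    "a_n = n^(-1/4)": n ** -0.25,
}
for label, value in choices.items():
    print(label, remainder_order(n, value))
print("n^(-1)", 1.0 / n, "n^(-2/3)", n ** (-2.0 / 3.0), "n^(-1/2)", n ** -0.5)
```

A sequence that tends to zero more slowly, such as $1/\log\log n$ compared with $1/\log n$, gives a smaller order.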

Using a linear representation for $\hat\beta_n - \beta_0$, a full linearization for the Breslow
estimator can be obtained. Such a linear representation can be deduced from the proof of
Theorem 3.2 in Tsiatis (1981) or from an application of Theorem 2.11 in Kosorok (2008); see
also Section 4.2.1 in Kosorok (2008). As a consequence, Theorem 1 together with the expansion
of $\hat\beta_n - \beta_0$ can be used to establish a central limit theorem for the Breslow
estimator, as well as to estimate the limiting covariance structure, by using plug-in
estimators. For example, the term $A_0$ in the linear expression can be estimated consistently
by $A_n$ in (10).